<!DOCTYPE html>
<html >

<head>

  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <title>《属性数据分析》代码</title>
  <meta name="description" content="《属性数据分析》代码。">
  <meta name="generator" content="bookdown 0.7 and GitBook 2.6.7">

  <meta property="og:title" content="《属性数据分析》代码" />
  <meta property="og:type" content="book" />
  
  
  <meta property="og:description" content="《属性数据分析》代码。" />
  

  <meta name="twitter:card" content="summary" />
  <meta name="twitter:title" content="《属性数据分析》代码" />
  
  <meta name="twitter:description" content="《属性数据分析》代码。" />
  



<meta name="date" content="2018-09-05">

  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="black">
  
  
<link rel="prev" href="contingency-table.html">
<link rel="next" href="logistic-regression.html">
<script src="libs/jquery/jquery.min.js"></script>
<link href="libs/gitbook/css/style.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-fontsettings.css" rel="stylesheet" />









<style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
  { position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
  { content: attr(data-line-number);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; pointer-events: all; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>

<link rel="stylesheet" href="css/style.css" type="text/css" />
</head>

<body>



  <div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">

    <div class="book-summary">
      <nav role="navigation">

<ul class="summary">
<li><a href="./index.html">《属性数据分析》代码</a></li>

<li class="divider"></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>前言</a></li>
<li class="chapter" data-level="1" data-path="intro.html"><a href="intro.html"><i class="fa fa-check"></i><b>1</b> 导言</a><ul>
<li class="chapter" data-level="1.1" data-path="intro.html"><a href="intro.html#data-intro"><i class="fa fa-check"></i><b>1.1</b> 属性响应数据</a></li>
<li class="chapter" data-level="1.2" data-path="intro.html"><a href="intro.html#prob-dist"><i class="fa fa-check"></i><b>1.2</b> 属性数据的概率分布</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布计算"><i class="fa fa-check"></i>二项分布计算</a></li>
</ul></li>
<li class="chapter" data-level="1.3" data-path="intro.html"><a href="intro.html#stat-infer"><i class="fa fa-check"></i><b>1.3</b> 比例的统计推断</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布似然函数图"><i class="fa fa-check"></i>二项分布似然函数图</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布假设检验"><i class="fa fa-check"></i>二项分布假设检验</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布置信区间"><i class="fa fa-check"></i>二项分布置信区间</a></li>
</ul></li>
<li class="chapter" data-level="1.4" data-path="intro.html"><a href="intro.html#more-stat-infer"><i class="fa fa-check"></i><b>1.4</b> 关于离散数据的更多统计推断</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布参数统计推断"><i class="fa fa-check"></i>二项分布参数统计推断</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#小样本推断"><i class="fa fa-check"></i>小样本推断</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#小样本推断p值调整"><i class="fa fa-check"></i>小样本推断P值调整</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#problems-ch1"><i class="fa fa-check"></i>课后题</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#第4题"><i class="fa fa-check"></i>第4题</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="2" data-path="contingency-table.html"><a href="contingency-table.html"><i class="fa fa-check"></i><b>2</b> 列联表</a><ul>
<li class="chapter" data-level="2.1" data-path="contingency-table.html"><a href="contingency-table.html#stucture"><i class="fa fa-check"></i><b>2.1</b> 列联表的概率结构</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#关于来世"><i class="fa fa-check"></i>关于来世</a></li>
</ul></li>
<li class="chapter" data-level="2.2" data-path="contingency-table.html"><a href="contingency-table.html#prop-compare"><i class="fa fa-check"></i><b>2.2</b> 2×2表比例的比较</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#阿司匹林与心脏病列联表检验"><i class="fa fa-check"></i>阿司匹林与心脏病（列联表检验）</a></li>
</ul></li>
<li class="chapter" data-level="2.3" data-path="contingency-table.html"><a href="contingency-table.html#odds-ratio"><i class="fa fa-check"></i><b>2.3</b> 优势比</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#阿司匹林与心脏病优势比"><i class="fa fa-check"></i>阿司匹林与心脏病（优势比）</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#吸烟状态与心肌梗死"><i class="fa fa-check"></i>吸烟状态与心肌梗死</a></li>
</ul></li>
<li class="chapter" data-level="2.4" data-path="contingency-table.html"><a href="contingency-table.html#chi-square-test"><i class="fa fa-check"></i><b>2.4</b> 独立性的卡方检验</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#性别和党派认同"><i class="fa fa-check"></i>性别和党派认同</a></li>
</ul></li>
<li class="chapter" data-level="2.5" data-path="contingency-table.html"><a href="contingency-table.html#indenpendence-test-for-ordinal-data"><i class="fa fa-check"></i><b>2.5</b> 有序数据的独立性检验</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#饮酒与婴儿畸形"><i class="fa fa-check"></i>饮酒与婴儿畸形</a></li>
</ul></li>
<li class="chapter" data-level="2.6" data-path="contingency-table.html"><a href="contingency-table.html#exact-test-for-small-sample"><i class="fa fa-check"></i><b>2.6</b> 小样本的精确推断</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#女士品茶"><i class="fa fa-check"></i>女士品茶</a></li>
</ul></li>
<li class="chapter" data-level="2.7" data-path="contingency-table.html"><a href="contingency-table.html#three-way-table"><i class="fa fa-check"></i><b>2.7</b> 三项列联表的关联性</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#死刑判决案例"><i class="fa fa-check"></i>死刑判决案例</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#临床试验"><i class="fa fa-check"></i>临床试验</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#problems-ch2"><i class="fa fa-check"></i>课后题</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#第18题"><i class="fa fa-check"></i>第18题</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#第22题"><i class="fa fa-check"></i>第22题</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#第33题"><i class="fa fa-check"></i>第33题</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="3" data-path="glm.html"><a href="glm.html"><i class="fa fa-check"></i><b>3</b> 广义线性模型</a><ul>
<li class="chapter" data-level="3.1" data-path="glm.html"><a href="glm.html#components-of-glm"><i class="fa fa-check"></i><b>3.1</b> 广义线性模型的构成部分</a></li>
<li class="chapter" data-level="3.2" data-path="glm.html"><a href="glm.html#glm-for-binary-data"><i class="fa fa-check"></i><b>3.2</b> 二分数据的广义线性模型</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#打鼾与心脏病"><i class="fa fa-check"></i>打鼾与心脏病</a></li>
</ul></li>
<li class="chapter" data-level="3.3" data-path="glm.html"><a href="glm.html#glm-for-count-data"><i class="fa fa-check"></i><b>3.3</b> 计数数据的广义线性模型</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#母鲎及其追随者泊松glm"><i class="fa fa-check"></i>母鲎及其追随者（泊松GLM）</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#母鲎及其追随者负二项glm"><i class="fa fa-check"></i>母鲎及其追随者（负二项GLM）</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#英国的火车事故"><i class="fa fa-check"></i>英国的火车事故</a></li>
</ul></li>
<li class="chapter" data-level="3.4" data-path="glm.html"><a href="glm.html#stat-infer-glm"><i class="fa fa-check"></i><b>3.4</b> 统计推断和模型检验</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#打鼾与心脏病-1"><i class="fa fa-check"></i>打鼾与心脏病</a></li>
</ul></li>
<li class="chapter" data-level="3.5" data-path="glm.html"><a href="glm.html#fit-glm"><i class="fa fa-check"></i><b>3.5</b> 广义线性模型的拟合</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#problems-ch3"><i class="fa fa-check"></i>课后题</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第3题"><i class="fa fa-check"></i>第3题</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第4题-1"><i class="fa fa-check"></i>第4题</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第7题"><i class="fa fa-check"></i>第7题</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第20题"><i class="fa fa-check"></i>第20题</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="4" data-path="logistic-regression.html"><a href="logistic-regression.html"><i class="fa fa-check"></i><b>4</b> logistic回归</a><ul>
<li class="chapter" data-level="4.1" data-path="logistic-regression.html"><a href="logistic-regression.html#interpret-logistic"><i class="fa fa-check"></i><b>4.1</b> logistic回归模型的解释</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#母鲎及其追随者logistic回归"><i class="fa fa-check"></i>母鲎及其追随者（logistic回归）</a></li>
</ul></li>
<li class="chapter" data-level="4.2" data-path="logistic-regression.html"><a href="logistic-regression.html#infer-logistic"><i class="fa fa-check"></i><b>4.2</b> logistic回归的推断</a></li>
<li class="chapter" data-level="4.3" data-path="logistic-regression.html"><a href="logistic-regression.html#cate-var-logistic"><i class="fa fa-check"></i><b>4.3</b> 属性预测变量的logistic回归</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#azt和aids"><i class="fa fa-check"></i>AZT和AIDS</a></li>
</ul></li>
<li class="chapter" data-level="4.4" data-path="logistic-regression.html"><a href="logistic-regression.html#multi-logistic"><i class="fa fa-check"></i><b>4.4</b> 多元logistic回归</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#母鲎及其追随者多元logistic"><i class="fa fa-check"></i>母鲎及其追随者（多元logistic）</a></li>
</ul></li>
<li class="chapter" data-level="4.5" data-path="logistic-regression.html"><a href="logistic-regression.html#logistic回归效应的概括"><i class="fa fa-check"></i><b>4.5</b> logistic回归效应的概括</a></li>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#problem-ch4"><i class="fa fa-check"></i>课后题</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#第8题"><i class="fa fa-check"></i>第8题</a></li>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#第24题"><i class="fa fa-check"></i>第24题</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="5" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html"><i class="fa fa-check"></i><b>5</b> logistic回归模型的构建和应用</a><ul>
<li class="chapter" data-level="5.1" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#model-selection"><i class="fa fa-check"></i><b>5.1</b> 模型选择策略</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者模型选择"><i class="fa fa-check"></i>母鲎及其追随者（模型选择）</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者预测功效"><i class="fa fa-check"></i>母鲎及其追随者（预测功效）</a></li>
</ul></li>
<li class="chapter" data-level="5.2" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#model-checking"><i class="fa fa-check"></i><b>5.2</b> 模型检验</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者模型lr检验"><i class="fa fa-check"></i>母鲎及其追随者（模型LR检验）</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#azt和aids拟合优度"><i class="fa fa-check"></i>AZT和AIDS（拟合优度）</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者hm检验"><i class="fa fa-check"></i>母鲎及其追随者（HM检验）</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#佛罗里达大学研究生入学"><i class="fa fa-check"></i>佛罗里达大学研究生入学</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#心脏病与血压的关系"><i class="fa fa-check"></i>心脏病与血压的关系</a></li>
</ul></li>
<li class="chapter" data-level="5.3" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#sparse-data-logistic"><i class="fa fa-check"></i><b>5.3</b> 稀疏数据效应</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#稀疏数据的临床试验结果"><i class="fa fa-check"></i>稀疏数据的临床试验结果</a></li>
</ul></li>
<li class="chapter" data-level="5.4" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#conditional-logistic"><i class="fa fa-check"></i><b>5.4</b> 条件logistic回归与精确推断</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#晋升能力"><i class="fa fa-check"></i>晋升能力</a></li>
</ul></li>
<li class="chapter" data-level="5.5" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#logistic-sample-num"><i class="fa fa-check"></i><b>5.5</b> logistic回归的样本量与功效</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#样本量计算"><i class="fa fa-check"></i>样本量计算</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#problem-ch5"><i class="fa fa-check"></i>课后题</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#第10题"><i class="fa fa-check"></i>第10题</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#第18题-1"><i class="fa fa-check"></i>第18题</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#第28题"><i class="fa fa-check"></i>第28题</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="6" data-path="multi-logit-model.html"><a href="multi-logit-model.html"><i class="fa fa-check"></i><b>6</b> 多类别logit模型</a><ul>
<li class="chapter" data-level="6.1" data-path="multi-logit-model.html"><a href="multi-logit-model.html#nomial-logit"><i class="fa fa-check"></i><b>6.1</b> 名义响应变量的logit模型</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#钝吻鳄食物选择"><i class="fa fa-check"></i>钝吻鳄食物选择</a></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#是否相信来世"><i class="fa fa-check"></i>是否相信来世</a></li>
</ul></li>
<li class="chapter" data-level="6.2" data-path="multi-logit-model.html"><a href="multi-logit-model.html#ordinal-logit"><i class="fa fa-check"></i><b>6.2</b> 有序响应变量的累积logit模型</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#政治意识形态和隶属党派的关系"><i class="fa fa-check"></i>政治意识形态和隶属党派的关系</a></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#对心理健康建模"><i class="fa fa-check"></i>对心理健康建模</a></li>
</ul></li>
<li class="chapter" data-level="6.3" data-path="multi-logit-model.html"><a href="multi-logit-model.html#paired-ordinal-logit"><i class="fa fa-check"></i><b>6.3</b> 成对类别有序logit</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#再访政治意识形态"><i class="fa fa-check"></i>再访政治意识形态</a></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#发育毒性研究"><i class="fa fa-check"></i>发育毒性研究</a></li>
</ul></li>
<li class="chapter" data-level="6.4" data-path="multi-logit-model.html"><a href="multi-logit-model.html#conditional-independent"><i class="fa fa-check"></i><b>6.4</b> 条件独立性检验</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#工作满意度和收入"><i class="fa fa-check"></i>工作满意度和收入</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#ch6-problems"><i class="fa fa-check"></i>课后题</a></li>
</ul></li>
<li class="appendix"><span><b>附录</b></span></li>
<li class="chapter" data-level="A" data-path="r-pkg-intro.html"><a href="r-pkg-intro.html"><i class="fa fa-check"></i><b>A</b> 配套R包使用介绍</a><ul>
<li class="chapter" data-level="A.1" data-path="r-pkg-intro.html"><a href="r-pkg-intro.html#r-pkg-install"><i class="fa fa-check"></i><b>A.1</b> 安装</a></li>
<li class="chapter" data-level="A.2" data-path="r-pkg-intro.html"><a href="r-pkg-intro.html#r-pkg-use"><i class="fa fa-check"></i><b>A.2</b> 使用说明</a></li>
</ul></li>
<li class="chapter" data-level="B" data-path="book-dataset-list.html"><a href="book-dataset-list.html"><i class="fa fa-check"></i><b>B</b> 教材数据列表</a><ul>
<li class="chapter" data-level="B.1" data-path="book-dataset-list.html"><a href="book-dataset-list.html#正文案例数据"><i class="fa fa-check"></i><b>B.1</b> 正文案例数据</a></li>
<li class="chapter" data-level="B.2" data-path="book-dataset-list.html"><a href="book-dataset-list.html#习题数据"><i class="fa fa-check"></i><b>B.2</b> 习题数据</a></li>
</ul></li>
<li class="divider"></li>
<li><a href="https://bookdown.org" target="blank">本书由 bookdown 强力驱动</a></li>

</ul>

      </nav>
    </div>

    <div class="book-body">
      <div class="body-inner">
        <div class="book-header" role="navigation">
          <h1>
            <i class="fa fa-circle-o-notch fa-spin"></i><a href="./">《属性数据分析》代码</a>
          </h1>
        </div>

        <div class="page-wrapper" tabindex="-1" role="main">
          <div class="page-inner">

            <section class="normal" id="section-">
<div id="glm" class="section level1">
<h1><span class="header-section-number">第 3 章</span> 广义线性模型</h1>
<div id="components-of-glm" class="section level2">
<h2><span class="header-section-number">3.1</span> 广义线性模型的构成部分</h2>
</div>
<div id="glm-for-binary-data" class="section level2">
<h2><span class="header-section-number">3.2</span> 二分数据的广义线性模型</h2>
<div id="打鼾与心脏病" class="section level3 unnumbered">
<h3>打鼾与心脏病</h3>
<div class="sourceCode" id="cb158"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb158-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb158-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;snoring_heartdisease&quot;</span>)</a>
<a class="sourceLine" id="cb158-3" data-line-number="3">snoring_heartdisease</a></code></pre></div>
<pre><code>##                     Heartdisease
## Snoring               Yes   No
##   Never                24 1355
##   Occasional           35  603
##   Nearly every night   21  192
##   Every night          30  224</code></pre>
<p>对打鼾数据（二分数据）拟合模型时，可以设定<code>family=binomial()</code>，其中<code>binomial()</code>的<code>link</code>参数为<code>identity</code>、<code>logit</code>、<code>probit</code>时，分别表示拟合线性概率模型、logistics模型和probit模型</p>
<p>以下使用打鼾频率得分0, 2, 4, 5拟合三个模型，并获取对应的预测概率</p>
<div class="sourceCode" id="cb160"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb160-1" data-line-number="1">scores &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">2</span>, <span class="dv">4</span>, <span class="dv">5</span>)</a>
<a class="sourceLine" id="cb160-2" data-line-number="2"></a>
<a class="sourceLine" id="cb160-3" data-line-number="3">snoring_linear &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb160-4" data-line-number="4">  snoring_heartdisease <span class="op">~</span><span class="st"> </span>scores, <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="dt">link =</span> <span class="st">&quot;identity&quot;</span>)</a>
<a class="sourceLine" id="cb160-5" data-line-number="5">)</a>
<a class="sourceLine" id="cb160-6" data-line-number="6">snoring_logistics &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb160-7" data-line-number="7">  snoring_heartdisease <span class="op">~</span><span class="st"> </span>scores, <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="dt">link =</span> <span class="st">&quot;logit&quot;</span>)</a>
<a class="sourceLine" id="cb160-8" data-line-number="8">)</a>
<a class="sourceLine" id="cb160-9" data-line-number="9">snoring_probit &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb160-10" data-line-number="10">  snoring_heartdisease <span class="op">~</span><span class="st"> </span>scores, <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="dt">link =</span> <span class="st">&quot;probit&quot;</span>)</a>
<a class="sourceLine" id="cb160-11" data-line-number="11">)</a>
<a class="sourceLine" id="cb160-12" data-line-number="12"></a>
<a class="sourceLine" id="cb160-13" data-line-number="13">model_list &lt;-<span class="st"> </span><span class="kw">list</span>(snoring_linear, snoring_logistics, snoring_probit)</a></code></pre></div>
<div class="sourceCode" id="cb161"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb161-1" data-line-number="1"><span class="co"># 模型系数</span></a>
<a class="sourceLine" id="cb161-2" data-line-number="2">estimated_coef &lt;-<span class="st"> </span><span class="kw">sapply</span>(model_list, coef)</a>
<a class="sourceLine" id="cb161-3" data-line-number="3"><span class="kw">colnames</span>(estimated_coef) &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;linear&quot;</span>, <span class="st">&quot;logit&quot;</span>, <span class="st">&quot;probit&quot;</span>)</a>
<a class="sourceLine" id="cb161-4" data-line-number="4"><span class="kw">round</span>(estimated_coef, <span class="dt">digits =</span> <span class="dv">3</span>)</a></code></pre></div>
<pre><code>##             linear  logit probit
## (Intercept)  0.017 -3.866 -2.061
## scores       0.020  0.397  0.188</code></pre>
<p>得到的三个模型为
<span class="math display">\[\hat{\pi}(x)=\hat{\alpha}+\hat{\beta}x=0.017+0.020x\]</span>
<span class="math display">\[\mathrm{logit}(\hat{\pi}(x))=\hat{\alpha}+\hat{\beta}x=-3.866+0.397x\]</span>
<span class="math display">\[\mathrm{probit}(\hat{\pi}(x))=\hat{\alpha}+\hat{\beta}x=-2.061+0.188x\]</span></p>
<div class="sourceCode" id="cb163"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb163-1" data-line-number="1"><span class="co"># 模型预测概率</span></a>
<a class="sourceLine" id="cb163-2" data-line-number="2">pred_prob &lt;-<span class="st"> </span><span class="kw">sapply</span>(model_list, predict, <span class="dt">type =</span> <span class="st">&quot;response&quot;</span>)</a>
<a class="sourceLine" id="cb163-3" data-line-number="3"><span class="kw">colnames</span>(pred_prob) &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;linear&quot;</span>, <span class="st">&quot;logit&quot;</span>, <span class="st">&quot;probit&quot;</span>)</a>
<a class="sourceLine" id="cb163-4" data-line-number="4"><span class="kw">round</span>(pred_prob, <span class="dt">digits =</span> <span class="dv">3</span>)</a></code></pre></div>
<pre><code>##                    linear logit probit
## Never               0.017 0.021  0.020
## Occasional          0.057 0.044  0.046
## Nearly every night  0.096 0.093  0.095
## Every night         0.116 0.132  0.131</code></pre>
<p>作图，在一张图中画出三个模型的图像</p>
<div class="sourceCode" id="cb165"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb165-1" data-line-number="1">snoring_new &lt;-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">scores=</span><span class="kw">seq</span>(<span class="op">-</span><span class="dv">1</span>, <span class="dv">6</span>, <span class="fl">0.01</span>))</a>
<a class="sourceLine" id="cb165-2" data-line-number="2"><span class="kw">plot</span>(</a>
<a class="sourceLine" id="cb165-3" data-line-number="3">  <span class="ot">NULL</span>, </a>
<a class="sourceLine" id="cb165-4" data-line-number="4">  <span class="dt">xlim =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">5</span>), <span class="dt">ylim =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="fl">0.2</span>),</a>
<a class="sourceLine" id="cb165-5" data-line-number="5">  <span class="dt">xlab =</span> <span class="st">&quot;Snoring&quot;</span>, <span class="dt">ylab =</span> <span class="st">&quot;Predicted Probability&quot;</span></a>
<a class="sourceLine" id="cb165-6" data-line-number="6">)</a>
<a class="sourceLine" id="cb165-7" data-line-number="7"></a>
<a class="sourceLine" id="cb165-8" data-line-number="8">line_col &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dt">identity =</span> <span class="dv">2</span>, <span class="dt">logit =</span> <span class="dv">3</span>, <span class="dt">probit =</span> <span class="dv">5</span>)</a>
<a class="sourceLine" id="cb165-9" data-line-number="9"><span class="kw">sapply</span>(model_list, <span class="cf">function</span>(m) {</a>
<a class="sourceLine" id="cb165-10" data-line-number="10">    pred_result &lt;-<span class="st"> </span><span class="kw">predict</span>(m, snoring_new, <span class="dt">type =</span> <span class="st">&quot;response&quot;</span>)</a>
<a class="sourceLine" id="cb165-11" data-line-number="11">    <span class="kw">lines</span>(</a>
<a class="sourceLine" id="cb165-12" data-line-number="12">      snoring_new<span class="op">$</span>scores, pred_result, <span class="dt">type =</span> <span class="st">&quot;l&quot;</span>, </a>
<a class="sourceLine" id="cb165-13" data-line-number="13">      <span class="dt">lty =</span> <span class="dv">1</span>, <span class="dt">col =</span> line_col[m<span class="op">$</span>family<span class="op">$</span>link]</a>
<a class="sourceLine" id="cb165-14" data-line-number="14">    )</a>
<a class="sourceLine" id="cb165-15" data-line-number="15">  }</a>
<a class="sourceLine" id="cb165-16" data-line-number="16">)</a>
<a class="sourceLine" id="cb165-17" data-line-number="17"><span class="kw">legend</span>(<span class="dv">1</span>, <span class="fl">0.15</span>, <span class="kw">names</span>(line_col), <span class="dt">col =</span> line_col, <span class="dt">lty =</span> <span class="dv">1</span>)</a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-57-1.png" width="70%" style="display: block; margin: auto;" /></p>
</div>
</div>
<div id="glm-for-count-data" class="section level2">
<h2><span class="header-section-number">3.3</span> 计数数据的广义线性模型</h2>
<div id="母鲎及其追随者泊松glm" class="section level3 unnumbered">
<h3>母鲎及其追随者（泊松GLM）</h3>
<div class="sourceCode" id="cb166"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb166-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb166-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>) <span class="co"># 母鲎及其追随者的数据集</span></a>
<a class="sourceLine" id="cb166-3" data-line-number="3"><span class="kw">head</span>(horseshoecrabs)</a></code></pre></div>
<pre><code>##   Color Spine Width Weight Satellites
## 1     2     3  28.3   3.05          8
## 2     3     3  22.5   1.55          0
## 3     1     1  26.0   2.30          9
## 4     3     3  24.8   2.10          0
## 5     3     3  26.0   2.60          4
## 6     2     3  23.8   2.10          0</code></pre>
<p>首先可以作出响应计数对宽度的图像，图中数字为对应点的观测数。</p>
<div class="sourceCode" id="cb168"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb168-1" data-line-number="1"><span class="kw">library</span>(dplyr)</a>
<a class="sourceLine" id="cb168-2" data-line-number="2">horseshoecrabs <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb168-3" data-line-number="3"><span class="st">  </span><span class="kw">group_by</span>(Satellites, Width) <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb168-4" data-line-number="4"><span class="st">  </span><span class="kw">summarise</span>(<span class="dt">n =</span> <span class="kw">n</span>()) <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb168-5" data-line-number="5"><span class="st">  </span><span class="kw">plot</span>(</a>
<a class="sourceLine" id="cb168-6" data-line-number="6">    Satellites <span class="op">~</span><span class="st"> </span>Width, <span class="dt">data =</span> .,</a>
<a class="sourceLine" id="cb168-7" data-line-number="7">    <span class="dt">pch =</span> <span class="kw">as.character</span>(n),  <span class="co"># 点类型设为数字</span></a>
<a class="sourceLine" id="cb168-8" data-line-number="8">    <span class="dt">xlab =</span> <span class="st">&quot;Width&quot;</span>, <span class="dt">ylab =</span> <span class="st">&quot;Number of Satellites&quot;</span>,  <span class="co"># 横纵坐标标签</span></a>
<a class="sourceLine" id="cb168-9" data-line-number="9">    <span class="dt">cex =</span> <span class="fl">0.8</span>  <span class="co"># 字体大小</span></a>
<a class="sourceLine" id="cb168-10" data-line-number="10">  )</a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-59-1.png" width="70%" style="display: block; margin: auto;" />
在使用该数据建模时，泊松对数线性模型可以通过在<code>glm</code>中设置<code>family=poisson</code>，在R里泊松回归的默认联系(<code>link</code>)为对数，因此在这里不需要修改<code>link</code></p>
<div class="sourceCode" id="cb169"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb169-1" data-line-number="1">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(Satellites <span class="op">~</span><span class="st"> </span>Width, <span class="dt">family =</span> <span class="kw">poisson</span>(), <span class="dt">data =</span> horseshoecrabs)</a>
<a class="sourceLine" id="cb169-2" data-line-number="2"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = Satellites ~ Width, family = poisson(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.853  -1.988  -0.493   1.097   4.922  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)   -3.305      0.542   -6.09  1.1e-09 ***
## Width          0.164      0.020    8.22  &lt; 2e-16 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 567.88  on 171  degrees of freedom
## AIC: 927.2
## 
## Number of Fisher Scoring iterations: 6</code></pre>
<p>可以得出拟合的对数线性模型为<span class="math display">\[\log \hat{\mu}=\hat{\alpha}+\hat{\beta}x=-3.305+0.164x\]</span></p>
<p>而如果要拟合恒等联系的泊松模型，需要设置<code>poisson(link=&quot;identity&quot;)</code>。另外在此案例中，直接跑回归会出现如下错误：</p>
<pre><code>Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In log(y/mu) : NaNs produced</code></pre>
<p>即需要指定寻找最优值过程的初值，否则有可能找不到解，这里可使用对数联系的泊松模型的系数作为初值</p>
<div class="sourceCode" id="cb172"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb172-1" data-line-number="1">m2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb172-2" data-line-number="2">  Satellites <span class="op">~</span><span class="st"> </span>Width, </a>
<a class="sourceLine" id="cb172-3" data-line-number="3">  <span class="dt">family =</span> <span class="kw">poisson</span>(<span class="dt">link =</span> <span class="st">&quot;identity&quot;</span>),  <span class="co"># 恒等联系的泊松模型</span></a>
<a class="sourceLine" id="cb172-4" data-line-number="4">  <span class="dt">data =</span> horseshoecrabs,</a>
<a class="sourceLine" id="cb172-5" data-line-number="5">  <span class="dt">start =</span> <span class="kw">coef</span>(m1)  <span class="co"># 使用m1的系数作为初值</span></a>
<a class="sourceLine" id="cb172-6" data-line-number="6">)</a>
<a class="sourceLine" id="cb172-7" data-line-number="7"><span class="kw">summary</span>(m2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = Satellites ~ Width, family = poisson(link = &quot;identity&quot;), 
##     data = horseshoecrabs, start = coef(m1))
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.911  -1.960  -0.541   1.041   4.799  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept) -11.5255     0.6777   -17.0   &lt;2e-16 ***
## Width         0.5492     0.0297    18.5   &lt;2e-16 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 557.71  on 171  degrees of freedom
## AIC: 917
## 
## Number of Fisher Scoring iterations: 22</code></pre>
<p>可得拟合出的模型为<span class="math display">\[\hat{\mu}=\hat{\alpha}+\hat{\beta}x=-11.525+0.549x\]</span></p>
<p>最后可以查看恒等联系和对数联系的模型估计的差别，即教材中的图3.6。</p>
<div class="sourceCode" id="cb174"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb174-1" data-line-number="1"><span class="co"># 需要先按宽度分组，再求各组的平均宽度和平均追随者只数</span></a>
<a class="sourceLine" id="cb174-2" data-line-number="2">mean_satellite_width &lt;-<span class="st"> </span>horseshoecrabs <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb174-3" data-line-number="3"><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">width_group =</span> <span class="kw">cut</span>(Width, <span class="kw">c</span>(<span class="dv">0</span>, <span class="fl">23.25</span> <span class="op">+</span><span class="st"> </span><span class="dv">0</span><span class="op">:</span><span class="dv">6</span>, <span class="ot">Inf</span>), <span class="dt">dig.lab =</span> <span class="dv">4</span>)) <span class="op">%&gt;%</span><span class="st"> </span><span class="co"># 分区间</span></a>
<a class="sourceLine" id="cb174-4" data-line-number="4"><span class="st">  </span><span class="kw">group_by</span>(width_group) <span class="op">%&gt;%</span><span class="st"> </span><span class="co"># 声明按width_group进行分组</span></a>
<a class="sourceLine" id="cb174-5" data-line-number="5"><span class="st">  </span><span class="kw">summarise</span>(</a>
<a class="sourceLine" id="cb174-6" data-line-number="6">    <span class="dt">mean_width =</span> <span class="kw">mean</span>(Width),  <span class="co"># 平均宽度</span></a>
<a class="sourceLine" id="cb174-7" data-line-number="7">    <span class="dt">mean_satellite =</span> <span class="kw">mean</span>(Satellites)  <span class="co"># 平均追随者只数</span></a>
<a class="sourceLine" id="cb174-8" data-line-number="8">  )</a>
<a class="sourceLine" id="cb174-9" data-line-number="9"></a>
<a class="sourceLine" id="cb174-10" data-line-number="10"><span class="kw">plot</span>(</a>
<a class="sourceLine" id="cb174-11" data-line-number="11">  mean_satellite <span class="op">~</span><span class="st"> </span>mean_width, </a>
<a class="sourceLine" id="cb174-12" data-line-number="12">  <span class="dt">data =</span> mean_satellite_width, <span class="co"># 选定数据集</span></a>
<a class="sourceLine" id="cb174-13" data-line-number="13">  <span class="dt">pch =</span> <span class="dv">20</span>, <span class="co"># 点类型为实心圆点</span></a>
<a class="sourceLine" id="cb174-14" data-line-number="14">  <span class="dt">xlab =</span> <span class="st">&quot;Width&quot;</span>, <span class="dt">ylab =</span> <span class="st">&quot;Number of Satellites&quot;</span> <span class="co"># 横纵坐标标签</span></a>
<a class="sourceLine" id="cb174-15" data-line-number="15">)</a>
<a class="sourceLine" id="cb174-16" data-line-number="16"></a>
<a class="sourceLine" id="cb174-17" data-line-number="17">x &lt;-<span class="st"> </span><span class="kw">seq</span>(<span class="dv">22</span>, <span class="dv">32</span>, <span class="fl">0.1</span>)</a>
<a class="sourceLine" id="cb174-18" data-line-number="18">y_m1 &lt;-<span class="st"> </span><span class="kw">predict</span>(m1, <span class="kw">data.frame</span>(<span class="dt">Width =</span> x), <span class="dt">type =</span> <span class="st">&quot;response&quot;</span>)</a>
<a class="sourceLine" id="cb174-19" data-line-number="19">y_m2 &lt;-<span class="st"> </span><span class="kw">predict</span>(m2, <span class="kw">data.frame</span>(<span class="dt">Width =</span> x))</a>
<a class="sourceLine" id="cb174-20" data-line-number="20"><span class="kw">lines</span>(x, y_m1, <span class="dt">type =</span> <span class="st">&quot;l&quot;</span>, <span class="dt">col =</span> <span class="dv">2</span>)</a>
<a class="sourceLine" id="cb174-21" data-line-number="21"><span class="kw">lines</span>(x, y_m2, <span class="dt">type =</span> <span class="st">&quot;l&quot;</span>, <span class="dt">col =</span> <span class="dv">3</span>)</a>
<a class="sourceLine" id="cb174-22" data-line-number="22"><span class="kw">legend</span>(<span class="dv">28</span>, <span class="dv">2</span>, <span class="kw">c</span>(<span class="st">&quot;Log Link&quot;</span>, <span class="st">&quot;Identiry Link&quot;</span>), <span class="dt">col =</span> <span class="kw">c</span>(<span class="dv">2</span>, <span class="dv">3</span>), <span class="dt">lty =</span> <span class="dv">1</span>)</a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-62-1.png" width="70%" style="display: block; margin: auto;" /></p>
</div>
<div id="母鲎及其追随者负二项glm" class="section level3 unnumbered">
<h3>母鲎及其追随者（负二项GLM）</h3>
<p>负二项GLM与泊松GLM类似，但设定<code>glm</code>函数中的<code>family</code>参数时，R中并不自带负二项分布的<code>family</code>，需要使用<code>MASS</code>包中的<code>negative.binomial()</code>。该函数的默认<code>link</code>为对数，但需要额外指定一个参数<code>theta</code>，该参数的意义为教材3.3.4节中<code>D</code>的倒数。</p>
<p>需要指定<span class="math inline">\(\theta\)</span>的原因（应该）是<code>glm()</code>函数不带有寻找最优<span class="math inline">\(\theta\)</span>的过程，这里使用教材中最后得出的<span class="math inline">\(\widehat{D}=1.1\)</span>的倒数为<span class="math inline">\(\theta\)</span>的值。</p>
<div class="sourceCode" id="cb175"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb175-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb175-2" data-line-number="2"><span class="kw">library</span>(MASS)</a>
<a class="sourceLine" id="cb175-3" data-line-number="3">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb175-4" data-line-number="4">  Satellites <span class="op">~</span><span class="st"> </span>Width, </a>
<a class="sourceLine" id="cb175-5" data-line-number="5">  <span class="dt">family =</span> <span class="kw">negative.binomial</span>(<span class="dt">theta =</span> <span class="dv">1</span> <span class="op">/</span><span class="st"> </span><span class="fl">1.1</span>), </a>
<a class="sourceLine" id="cb175-6" data-line-number="6">  <span class="dt">data =</span> horseshoecrabs</a>
<a class="sourceLine" id="cb175-7" data-line-number="7">)</a>
<a class="sourceLine" id="cb175-8" data-line-number="8"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = Satellites ~ Width, family = negative.binomial(theta = 1/1.1), 
##     data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.782  -1.412  -0.251   0.478   2.022  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(&gt;|t|)    
## (Intercept)  -4.0514     1.0777   -3.76  0.00023 ***
## Width         0.1920     0.0405    4.74  4.5e-06 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for Negative Binomial(0.9091) family taken to be 0.8495)
## 
##     Null deviance: 213.63  on 172  degrees of freedom
## Residual deviance: 196.33  on 171  degrees of freedom
## AIC: 755.3
## 
## Number of Fisher Scoring iterations: 5</code></pre>
<p>另一个更优的拟合负二项GLM的办法是使用<code>MASS</code>包中的<code>glm.nb()</code>，该函数带有寻找最优<span class="math inline">\(\theta\)</span>的过程，不需要指定<span class="math inline">\(\theta\)</span>参数。</p>
<div class="sourceCode" id="cb177"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb177-1" data-line-number="1">m2 &lt;-<span class="st"> </span><span class="kw">glm.nb</span>(Satellites <span class="op">~</span><span class="st"> </span>Width, <span class="dt">data =</span> horseshoecrabs)</a>
<a class="sourceLine" id="cb177-2" data-line-number="2"><span class="kw">summary</span>(m2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm.nb(formula = Satellites ~ Width, data = horseshoecrabs, init.theta = 0.90456808, 
##     link = log)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.780  -1.411  -0.250   0.477   2.018  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)  -4.0525     1.1714   -3.46  0.00054 ***
## Width         0.1921     0.0441    4.36  1.3e-05 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for Negative Binomial(0.9046) family taken to be 1)
## 
##     Null deviance: 213.05  on 172  degrees of freedom
## Residual deviance: 195.81  on 171  degrees of freedom
## AIC: 757.3
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.905 
##           Std. Err.:  0.161 
## 
##  2 x log-likelihood:  -751.291</code></pre>
</div>
<div id="英国的火车事故" class="section level3 unnumbered">
<h3>英国的火车事故</h3>
<div class="sourceCode" id="cb179"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb179-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb179-2" data-line-number="2"><span class="kw">library</span>(MASS)</a>
<a class="sourceLine" id="cb179-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;traincollisions&quot;</span>)</a>
<a class="sourceLine" id="cb179-4" data-line-number="4"><span class="kw">head</span>(traincollisions)</a></code></pre></div>
<pre><code>##   Year  KM Train TrRd
## 1 2003 518     0    3
## 2 2002 516     1    3
## 3 2001 508     0    4
## 4 2000 503     1    3
## 5 1999 505     1    2
## 6 1998 487     0    4</code></pre>
<p>根据3.5节，使用泊松glm拟合比率数据时，模型为<span class="math display">\[\log(\mu/t)=\log(\mu)-\log(t)=\alpha+\beta x\]</span></p>
<p>由于泊松glm的y需要是正整数，因此可以把对数比率（<span class="math inline">\(\log(\mu/t)\)</span>）化为两个对数相减（<span class="math inline">\(\log(\mu)-\log(t)\)</span>），再把<span class="math inline">\(\log(t)\)</span>作为<code>offset</code></p>
<p>对于<code>glm</code>函数，有一个参数<code>offset</code>可以直接设置</p>
<div class="sourceCode" id="cb181"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb181-1" data-line-number="1">traincollisions<span class="op">$</span>year0 &lt;-<span class="st"> </span>traincollisions<span class="op">$</span>Year <span class="op">-</span><span class="st"> </span><span class="dv">1975</span></a>
<a class="sourceLine" id="cb181-2" data-line-number="2">m_poisson &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb181-3" data-line-number="3">  TrRd <span class="op">~</span><span class="st"> </span>year0, </a>
<a class="sourceLine" id="cb181-4" data-line-number="4">  <span class="dt">data =</span> traincollisions, <span class="dt">family =</span> <span class="kw">poisson</span>(),</a>
<a class="sourceLine" id="cb181-5" data-line-number="5">  <span class="dt">offset =</span> <span class="kw">log</span>(traincollisions<span class="op">$</span>KM)</a>
<a class="sourceLine" id="cb181-6" data-line-number="6">)</a>
<a class="sourceLine" id="cb181-7" data-line-number="7"><span class="kw">summary</span>(m_poisson)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = TrRd ~ year0, family = poisson(), data = traincollisions, 
##     offset = log(traincollisions$KM))
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.058  -0.783  -0.083   0.377   3.387  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)  -4.2114     0.1589  -26.50   &lt;2e-16 ***
## year0        -0.0329     0.0108   -3.06   0.0022 ** 
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 47.376  on 28  degrees of freedom
## Residual deviance: 37.853  on 27  degrees of freedom
## AIC: 133.5
## 
## Number of Fisher Scoring iterations: 5</code></pre>
<p>得到的模型为<span class="math display">\[\log(\hat{\mu})-\log(t)=-4.2114-0.0329x\]</span></p>
<p>而负二项glm使用的<code>glm.nb()</code>函数没有<code>offset</code>参数，因此可利用<code>offset()</code>函数将其纳入<code>formula</code>中。</p>
<div class="sourceCode" id="cb183"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb183-1" data-line-number="1">m_nb &lt;-<span class="st"> </span><span class="kw">glm.nb</span>(</a>
<a class="sourceLine" id="cb183-2" data-line-number="2">  TrRd <span class="op">~</span><span class="st"> </span>year0 <span class="op">+</span><span class="st"> </span><span class="kw">offset</span>(<span class="kw">log</span>(KM)), </a>
<a class="sourceLine" id="cb183-3" data-line-number="3">  <span class="dt">data =</span> traincollisions</a>
<a class="sourceLine" id="cb183-4" data-line-number="4">)</a>
<a class="sourceLine" id="cb183-5" data-line-number="5"><span class="kw">summary</span>(m_nb)</a></code></pre></div>
<pre><code>## 
## Call:
## glm.nb(formula = TrRd ~ year0 + offset(log(KM)), data = traincollisions, 
##     init.theta = 10.11828724, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7237  -0.6546  -0.0587   0.3298   2.6407  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)  -4.2000     0.1958  -21.45   &lt;2e-16 ***
## year0        -0.0337     0.0129   -2.61   0.0089 ** 
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for Negative Binomial(10.12) family taken to be 1)
## 
##     Null deviance: 32.045  on 28  degrees of freedom
## Residual deviance: 25.264  on 27  degrees of freedom
## AIC: 132.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  10.12 
##           Std. Err.:  8.00 
## 
##  2 x log-likelihood:  -126.69</code></pre>
<p>得到模型为<span class="math display">\[\log(\hat{\mu})-\log(t)=-4.2000-0.0337x\]</span></p>
</div>
</div>
<div id="stat-infer-glm" class="section level2">
<h2><span class="header-section-number">3.4</span> 统计推断和模型检验</h2>
<div id="打鼾与心脏病-1" class="section level3 unnumbered">
<h3>打鼾与心脏病</h3>
<div class="sourceCode" id="cb185"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb185-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb185-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;snoring_heartdisease&quot;</span>)</a>
<a class="sourceLine" id="cb185-3" data-line-number="3">scores &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">2</span>, <span class="dv">4</span>, <span class="dv">5</span>)</a>
<a class="sourceLine" id="cb185-4" data-line-number="4">snoring_linear &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb185-5" data-line-number="5">  snoring_heartdisease <span class="op">~</span><span class="st"> </span>scores, </a>
<a class="sourceLine" id="cb185-6" data-line-number="6">  <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="dt">link =</span> <span class="st">&quot;identity&quot;</span>)</a>
<a class="sourceLine" id="cb185-7" data-line-number="7">)</a>
<a class="sourceLine" id="cb185-8" data-line-number="8"><span class="kw">summary</span>(snoring_linear)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = snoring_heartdisease ~ scores, family = binomial(link = &quot;identity&quot;))
## 
## Deviance Residuals: 
##              Never          Occasional  Nearly every night  
##             0.0448             -0.2132              0.1101  
##        Every night  
##             0.0980  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)  0.01725    0.00345    5.00  5.8e-07 ***
## scores       0.01978    0.00280    7.05  1.8e-12 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 65.904481  on 3  degrees of freedom
## Residual deviance:  0.069191  on 2  degrees of freedom
## AIC: 24.32
## 
## Number of Fisher Scoring iterations: 3</code></pre>
<div class="sourceCode" id="cb187"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb187-1" data-line-number="1"><span class="kw">anova</span>(snoring_linear, <span class="dt">test =</span> <span class="st">&quot;Chisq&quot;</span>)</a></code></pre></div>
<pre><code>## Analysis of Deviance Table
## 
## Model: binomial, link: identity
## 
## Response: snoring_heartdisease
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev Pr(&gt;Chi)    
## NULL                       3       65.9             
## scores  1     65.8         2        0.1  4.9e-16 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</code></pre>
<p>可使用<code>confint()</code>来得到likelihood ratio置信区间，但需要先得到载入<code>MASS</code>包</p>
<div class="sourceCode" id="cb189"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb189-1" data-line-number="1"><span class="kw">library</span>(MASS)</a>
<a class="sourceLine" id="cb189-2" data-line-number="2"><span class="kw">confint</span>(snoring_linear)</a></code></pre></div>
<pre><code>##               2.5 %  97.5 %
## (Intercept) 0.01133 0.02483
## scores      0.01452 0.02551</code></pre>
</div>
</div>
<div id="fit-glm" class="section level2">
<h2><span class="header-section-number">3.5</span> 广义线性模型的拟合</h2>
</div>
<div id="problems-ch3" class="section level2 unnumbered">
<h2>课后题</h2>
<div id="第3题" class="section level3 unnumbered">
<h3>第3题</h3>
<ol style="list-style-type: lower-alpha">
<li>列出预测等式:<span class="math display">\[y=0.0025+0.0011*alcohol\]</span></li>
</ol>
<p>斜率0.0011表示，饮酒水平每增加1，婴儿畸形的概率上升0.0011</p>
<ol start="2" style="list-style-type: lower-alpha">
<li><span class="math display">\[y_1=0.0025+0.0011\times 0=0.0025\]</span>
<span class="math display">\[y_2=0.0025+0.0011\times 7=0.0102\]</span></li>
</ol>
<p>饮酒量为0时，畸形的概率为0.0025；饮酒量为7.0时，畸形的概率为0.0102,；相对风险0.245</p>
</div>
<div id="第4题-1" class="section level3 unnumbered">
<h3>第4题</h3>
<ol style="list-style-type: lower-alpha">
<li>敏感</li>
</ol>
<p>首先拟合第三题中的模型</p>
<p>注意拟合线性概率模型时使用的矩阵需要第一列表示“成功”的数量，所以这里数据的两列需要先互换</p>
<div class="sourceCode" id="cb191"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb191-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb191-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;malformation&quot;</span>)</a>
<a class="sourceLine" id="cb191-3" data-line-number="3">alcohol_score &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">0</span>, <span class="fl">0.5</span>, <span class="fl">1.5</span>, <span class="dv">4</span>, <span class="dv">7</span>)  <span class="co"># 饮酒量得分</span></a>
<a class="sourceLine" id="cb191-4" data-line-number="4"><span class="co"># 更换顺序两列的顺序</span></a>
<a class="sourceLine" id="cb191-5" data-line-number="5"><span class="co"># 拟合线性概率模型时使用的矩阵需要第一列表示“成功”的数量</span></a>
<a class="sourceLine" id="cb191-6" data-line-number="6">malformation0 &lt;-<span class="st"> </span>malformation[, <span class="dv">2</span><span class="op">:</span><span class="dv">1</span>]</a>
<a class="sourceLine" id="cb191-7" data-line-number="7">malformation0</a></code></pre></div>
<pre><code>##        Malformation
## Alcohol Present Absent
##     0        48  17066
##     &lt;1       38  14464
##     1-2       5    788
##     3-5       1    126
##     &gt;=6       1     37</code></pre>
<div class="sourceCode" id="cb193"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb193-1" data-line-number="1">m_problem3 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb193-2" data-line-number="2">  malformation0 <span class="op">~</span><span class="st"> </span>alcohol_score, <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="st">&quot;identity&quot;</span>)</a>
<a class="sourceLine" id="cb193-3" data-line-number="3">)</a>
<a class="sourceLine" id="cb193-4" data-line-number="4"><span class="kw">summary</span>(m_problem3)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = malformation0 ~ alcohol_score, family = binomial(&quot;identity&quot;))
## 
## Deviance Residuals: 
##      0      &lt;1     1-2     3-5     &gt;=6  
##  0.656  -1.049   0.863   0.130   0.828  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)   0.002548   0.000352    7.23  4.8e-13 ***
## alcohol_score 0.001087   0.000832    1.31     0.19    
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6.2020  on 4  degrees of freedom
## Residual deviance: 2.9795  on 3  degrees of freedom
## AIC: 25.61
## 
## Number of Fisher Scoring iterations: 10</code></pre>
<p>删除一个观测重新拟合</p>
<div class="sourceCode" id="cb195"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb195-1" data-line-number="1">malformation1 &lt;-<span class="st"> </span>malformation0</a>
<a class="sourceLine" id="cb195-2" data-line-number="2">malformation1[<span class="dv">5</span>, <span class="dv">1</span>] &lt;-<span class="st"> </span><span class="dv">0</span></a>
<a class="sourceLine" id="cb195-3" data-line-number="3">malformation1</a></code></pre></div>
<pre><code>##        Malformation
## Alcohol Present Absent
##     0        48  17066
##     &lt;1       38  14464
##     1-2       5    788
##     3-5       1    126
##     &gt;=6       0     37</code></pre>
<div class="sourceCode" id="cb197"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb197-1" data-line-number="1">m_problem4a &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb197-2" data-line-number="2">  malformation1 <span class="op">~</span><span class="st"> </span>alcohol_score, <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="st">&quot;identity&quot;</span>)</a>
<a class="sourceLine" id="cb197-3" data-line-number="3">)</a>
<a class="sourceLine" id="cb197-4" data-line-number="4"><span class="kw">summary</span>(m_problem4a)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = malformation1 ~ alcohol_score, family = binomial(&quot;identity&quot;))
## 
## Deviance Residuals: 
##      0      &lt;1     1-2     3-5     &gt;=6  
##  0.430  -0.791   1.127   0.369  -0.738  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)   0.002635   0.000352    7.48  7.5e-14 ***
## alcohol_score 0.000672   0.000785    0.86     0.39    
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3.7225  on 4  degrees of freedom
## Residual deviance: 2.7609  on 3  degrees of freedom
## AIC: 23.41
## 
## Number of Fisher Scoring iterations: 6</code></pre>
<div class="sourceCode" id="cb199"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb199-1" data-line-number="1"><span class="kw">cbind</span>(</a>
<a class="sourceLine" id="cb199-2" data-line-number="2">  <span class="st">`</span><span class="dt">model in problem 3</span><span class="st">`</span>=<span class="kw">coef</span>(m_problem3), </a>
<a class="sourceLine" id="cb199-3" data-line-number="3">  <span class="st">`</span><span class="dt">model in problem 4(a)</span><span class="st">`</span>=<span class="kw">coef</span>(m_problem4a)</a>
<a class="sourceLine" id="cb199-4" data-line-number="4">)</a></code></pre></div>
<pre><code>##               model in problem 3 model in problem 4(a)
## (Intercept)             0.002548             0.0026346
## alcohol_score           0.001087             0.0006716</code></pre>
<p>饮酒量的参数从0.00109下降到了0.00067，说明模型对那个观测敏感。</p>
<ol start="2" style="list-style-type: lower-alpha">
<li>敏感</li>
</ol>
<p>更换饮酒量得分，重新拟合</p>
<div class="sourceCode" id="cb201"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb201-1" data-line-number="1">alcohol_score_4b &lt;-<span class="st"> </span><span class="dv">0</span><span class="op">:</span><span class="dv">4</span></a>
<a class="sourceLine" id="cb201-2" data-line-number="2">m_problem4b &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb201-3" data-line-number="3">  malformation0 <span class="op">~</span><span class="st"> </span>alcohol_score_4b, <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="st">&quot;identity&quot;</span>)</a>
<a class="sourceLine" id="cb201-4" data-line-number="4">)</a>
<a class="sourceLine" id="cb201-5" data-line-number="5"><span class="kw">summary</span>(m_problem4b)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = malformation0 ~ alcohol_score_4b, family = binomial(&quot;identity&quot;))
## 
## Deviance Residuals: 
##      0      &lt;1     1-2     3-5     &gt;=6  
##  0.525  -1.072   1.145   0.588   1.360  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)      0.002598   0.000380    6.84  7.8e-12 ***
## alcohol_score_4b 0.000504   0.000528    0.96     0.34    
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6.2020  on 4  degrees of freedom
## Residual deviance: 4.9336  on 3  degrees of freedom
## AIC: 27.56
## 
## Number of Fisher Scoring iterations: 9</code></pre>
<div class="sourceCode" id="cb203"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb203-1" data-line-number="1"><span class="kw">cbind</span>(</a>
<a class="sourceLine" id="cb203-2" data-line-number="2">  <span class="st">`</span><span class="dt">model in problem 3</span><span class="st">`</span>=<span class="kw">coef</span>(m_problem3), </a>
<a class="sourceLine" id="cb203-3" data-line-number="3">  <span class="st">`</span><span class="dt">model in problem 4(b)</span><span class="st">`</span>=<span class="kw">coef</span>(m_problem4b)</a>
<a class="sourceLine" id="cb203-4" data-line-number="4">)</a></code></pre></div>
<pre><code>##               model in problem 3 model in problem 4(b)
## (Intercept)             0.002548             0.0025977
## alcohol_score           0.001087             0.0005044</code></pre>
<div class="sourceCode" id="cb205"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb205-1" data-line-number="1">alcohol_score_4b <span class="op">/</span><span class="st"> </span>alcohol_score</a></code></pre></div>
<pre><code>## [1]    NaN 2.0000 1.3333 0.7500 0.5714</code></pre>
<p>饮酒量的参数大概只是原来的一半，而饮酒量得分只有当饮酒量小于1时才是原来的两倍。</p>
<div class="sourceCode" id="cb207"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb207-1" data-line-number="1"><span class="kw">cbind</span>(</a>
<a class="sourceLine" id="cb207-2" data-line-number="2">  <span class="st">`</span><span class="dt">pred-prob in 3</span><span class="st">`</span> =<span class="st"> </span><span class="kw">predict</span>(m_problem3)[<span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">5</span>)],</a>
<a class="sourceLine" id="cb207-3" data-line-number="3">  <span class="st">`</span><span class="dt">pred-prob in 4(b)</span><span class="st">`</span> =<span class="st"> </span><span class="kw">predict</span>(m_problem4b)[<span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">5</span>)]</a>
<a class="sourceLine" id="cb207-4" data-line-number="4">)</a></code></pre></div>
<pre><code>##     pred-prob in 3 pred-prob in 4(b)
## 0         0.002548          0.002598
## &gt;=6       0.010158          0.004615</code></pre>
<p>预测出来的饮酒量也有很大差异（在最大饮酒量时）。</p>
<ol start="3" style="list-style-type: lower-alpha">
<li>拟合logit和probit模型</li>
</ol>
<div class="sourceCode" id="cb209"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb209-1" data-line-number="1">m_problem4c_logit &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb209-2" data-line-number="2">  malformation0 <span class="op">~</span><span class="st"> </span>alcohol_score, </a>
<a class="sourceLine" id="cb209-3" data-line-number="3">  <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="st">&quot;logit&quot;</span>)</a>
<a class="sourceLine" id="cb209-4" data-line-number="4">)</a>
<a class="sourceLine" id="cb209-5" data-line-number="5">m_problem4c_probit &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb209-6" data-line-number="6">  malformation0 <span class="op">~</span><span class="st"> </span>alcohol_score, </a>
<a class="sourceLine" id="cb209-7" data-line-number="7">  <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="st">&quot;probit&quot;</span>)</a>
<a class="sourceLine" id="cb209-8" data-line-number="8">)</a>
<a class="sourceLine" id="cb209-9" data-line-number="9"></a>
<a class="sourceLine" id="cb209-10" data-line-number="10"></a>
<a class="sourceLine" id="cb209-11" data-line-number="11"><span class="kw">cbind</span>(</a>
<a class="sourceLine" id="cb209-12" data-line-number="12">  <span class="st">`</span><span class="dt">linear</span><span class="st">`</span> =<span class="st"> </span><span class="kw">coef</span>(m_problem3), </a>
<a class="sourceLine" id="cb209-13" data-line-number="13">  <span class="st">`</span><span class="dt">logit</span><span class="st">`</span> =<span class="st"> </span><span class="kw">coef</span>(m_problem4c_logit),</a>
<a class="sourceLine" id="cb209-14" data-line-number="14">  <span class="st">`</span><span class="dt">probit</span><span class="st">`</span> =<span class="st"> </span><span class="kw">coef</span>(m_problem4c_probit)</a>
<a class="sourceLine" id="cb209-15" data-line-number="15">)</a></code></pre></div>
<pre><code>##                 linear   logit  probit
## (Intercept)   0.002548 -5.9605 -2.7996
## alcohol_score 0.001087  0.3166  0.1098</code></pre>
<p>从而三个模型为</p>
<p><span class="math display">\[\hat{\pi}(x)=\hat{\alpha}+\hat{\beta}x=0.00255+0.00109x\]</span>
<span class="math display">\[\mathrm{logit}(\hat{\pi}(x))=\hat{\alpha}+\hat{\beta}x=-5.96046+0.31656x\]</span>
<span class="math display">\[\mathrm{probit}(\hat{\pi}(x))=\hat{\alpha}+\hat{\beta}x=-2.79961+0.10979x\]</span></p>
<p>三个模型饮酒量的系数都为正，表明随着饮酒程度的增加，婴儿畸形概率增大</p>
</div>
<div id="第7题" class="section level3 unnumbered">
<h3>第7题</h3>
<p>在做题前需要先按题目要求，构造出Y变量，当鲎的追随者大于0时，Y=1，否则Y=0</p>
<div class="sourceCode" id="cb211"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb211-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb211-2" data-line-number="2"><span class="kw">data</span>(horseshoecrabs)</a>
<a class="sourceLine" id="cb211-3" data-line-number="3"><span class="co"># psat即为题目中的Y</span></a>
<a class="sourceLine" id="cb211-4" data-line-number="4">horseshoecrabs<span class="op">$</span>psat &lt;-<span class="st"> </span><span class="kw">as.integer</span>(horseshoecrabs<span class="op">$</span>Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>)</a></code></pre></div>
<ol style="list-style-type: lower-alpha">
<li>OLS估计可直接使用<code>lm()</code>函数</li>
</ol>
<div class="sourceCode" id="cb212"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb212-1" data-line-number="1">m1 &lt;-<span class="st"> </span><span class="kw">lm</span>(psat <span class="op">~</span><span class="st"> </span>Weight, <span class="dt">data =</span> horseshoecrabs)</a>
<a class="sourceLine" id="cb212-2" data-line-number="2"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## lm(formula = psat ~ Weight, data = horseshoecrabs)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.888 -0.468  0.161  0.370  0.669 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(&gt;|t|)    
## (Intercept)  -0.1449     0.1472   -0.98     0.33    
## Weight        0.3227     0.0588    5.49  1.4e-07 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## Residual standard error: 0.445 on 171 degrees of freedom
## Multiple R-squared:  0.15,   Adjusted R-squared:  0.145 
## F-statistic: 30.2 on 1 and 171 DF,  p-value: 1.42e-07</code></pre>
<p>得到模型为<span class="math display">\[Y=-0.155+0.323weight\]</span></p>
<p>从而体重每增加1kg，有追随者的概率就增加0.323。</p>
<p>使用该模型预测当重量为5.2kg时有追随者的概率</p>
<div class="sourceCode" id="cb214"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb214-1" data-line-number="1"><span class="kw">predict</span>(m1, <span class="dt">newdata =</span> <span class="kw">data.frame</span>(<span class="dt">Weight =</span> <span class="fl">5.2</span>))</a></code></pre></div>
<pre><code>##     1 
## 1.533</code></pre>
<p>概率大于1，这说明这个线性模型并不好。</p>
<ol start="2" style="list-style-type: lower-alpha">
<li>尝试使用ML方法估计线性概率模型，会像这样报错</li>
</ol>
<div class="sourceCode" id="cb216"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb216-1" data-line-number="1">m2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb216-2" data-line-number="2">  psat <span class="op">~</span><span class="st"> </span>Weight, <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="dt">link =</span> <span class="st">&quot;identity&quot;</span>)</a>
<a class="sourceLine" id="cb216-3" data-line-number="3">)</a></code></pre></div>
<pre><code>## Error: no valid set of coefficients has been found: please supply starting values</code></pre>
<p>报错原因是拟合的概率落到了<span class="math inline">\((0, 1)\)</span>之外</p>
<ol start="3" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb218"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb218-1" data-line-number="1"><span class="co"># 拟合logistic模型</span></a>
<a class="sourceLine" id="cb218-2" data-line-number="2">m3 &lt;-<span class="st"> </span><span class="kw">glm</span>(psat <span class="op">~</span><span class="st"> </span>Weight, <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>())</a>
<a class="sourceLine" id="cb218-3" data-line-number="3"><span class="kw">summary</span>(m3)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Weight, family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.111  -1.075   0.543   0.912   1.629  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)   -3.695      0.880   -4.20  2.7e-05 ***
## Weight         1.815      0.377    4.82  1.4e-06 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.7
## 
## Number of Fisher Scoring iterations: 4</code></pre>
<div class="sourceCode" id="cb220"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb220-1" data-line-number="1"><span class="co"># weight为5.2kg时的logit值</span></a>
<a class="sourceLine" id="cb220-2" data-line-number="2"><span class="kw">predict</span>(m3, <span class="dt">newdata =</span> <span class="kw">data.frame</span>(<span class="dt">Weight =</span> <span class="fl">5.2</span>))</a></code></pre></div>
<pre><code>##     1 
## 5.744</code></pre>
<div class="sourceCode" id="cb222"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb222-1" data-line-number="1"><span class="kw">log</span>(<span class="fl">0.9968</span> <span class="op">/</span><span class="st"> </span>(<span class="dv">1</span> <span class="op">-</span><span class="st"> </span><span class="fl">0.9968</span>))</a></code></pre></div>
<pre><code>## [1] 5.741</code></pre>
</div>
<div id="第20题" class="section level3 unnumbered">
<h3>第20题</h3>
<ol style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb224"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb224-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb224-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;smoking_cd&quot;</span>)</a>
<a class="sourceLine" id="cb224-3" data-line-number="3"><span class="kw">ftable</span>(smoking_cd, <span class="dt">row.vars =</span> <span class="st">&quot;Age&quot;</span>, <span class="dt">col.vars =</span> <span class="kw">c</span>(<span class="st">&quot;Item&quot;</span>, <span class="st">&quot;Smoking&quot;</span>))</a></code></pre></div>
<pre><code>##       Item    Person-Years         Coronary Deaths        
##       Smoking   Nonsmokers Smokers      Nonsmokers Smokers
## Age                                                       
## 35-44                18793   52407               2      32
## 45-54                10673   43248              12     104
## 55-64                 5710   28612              28     206
## 65-74                 2585   12663              28     186
## 75-84                 1462    5317              31     102</code></pre>
<p>计算死亡率，以及吸烟和不吸烟的死亡率比例</p>
<div class="sourceCode" id="cb226"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb226-1" data-line-number="1">death_rate &lt;-<span class="st"> </span>smoking_cd[, , <span class="dv">2</span>] <span class="op">/</span><span class="st"> </span>smoking_cd[, , <span class="dv">1</span>] <span class="op">*</span><span class="st"> </span><span class="dv">1000</span></a>
<a class="sourceLine" id="cb226-2" data-line-number="2">death_rate[, <span class="dv">2</span>] <span class="op">/</span><span class="st"> </span>death_rate[, <span class="dv">1</span>]</a></code></pre></div>
<pre><code>##  35-44  45-54  55-64  65-74  75-84 
## 5.7376 2.1388 1.4682 1.3561 0.9047</code></pre>
<p>可以看出，不管是不是吸烟者，冠心病死亡率都随着年龄而增长。同时，吸烟的影响随着年龄的增长而减小。</p>
<ol start="2" style="list-style-type: lower-alpha">
<li><p>主效应模型假设了吸烟的影响不取决于年龄。这是不合理的，从(a)中可以明显看出吸烟的影响随着年龄而变化。</p></li>
<li><p>从(a)中可以看出吸烟的影响随着年龄增长而递减，从而我们可以给年龄赋予适当的得分来使其成为定量变量。基于此，我们可以考虑吸烟和年龄的定量交互。</p></li>
</ol>
<p>而模型为</p>
<p><span class="math display">\[\log\Big(\frac{deathnum}{personnum}\Big)=\alpha+\beta_1 \mathrm{age}+\beta_2\mathrm{smoking}+\beta_3\mathrm{age}\times \mathrm{smoking}\]</span></p>
<p>其中年龄为定量变量。对于吸烟者，<span class="math inline">\(\mathrm{smoking}=1\)</span>，则</p>
<p><span class="math display">\[\log\Big(\frac{deathnum}{personnum}\Big)=(\alpha+\beta_2)+(\beta_1+\beta_3)\mathrm{age}\]</span></p>
<p>对于非吸烟者，<span class="math inline">\(\mathrm{smoking}=0\)</span>，则</p>
<p><span class="math display">\[\log\Big(\frac{deathnum}{personnum}\Big)=\alpha+\beta_1\mathrm{age}\]</span></p>
<p>这两个都是线性模型</p>
<ol start="4" style="list-style-type: lower-alpha">
<li>拟合模型前需要先变换数据结构</li>
</ol>
<div class="sourceCode" id="cb228"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb228-1" data-line-number="1"><span class="kw">library</span>(tidyr)</a>
<a class="sourceLine" id="cb228-2" data-line-number="2">smoking_cd_df &lt;-<span class="st"> </span><span class="kw">spread</span>(<span class="kw">as.data.frame</span>(smoking_cd), Item, Freq)</a></code></pre></div>
<p>首先拟合(b)中的主效应模型</p>
<div class="sourceCode" id="cb229"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb229-1" data-line-number="1">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb229-2" data-line-number="2">  <span class="st">`</span><span class="dt">Coronary Deaths</span><span class="st">`</span> <span class="op">~</span><span class="st"> </span>Age <span class="op">+</span><span class="st"> </span>Smoking,</a>
<a class="sourceLine" id="cb229-3" data-line-number="3">  <span class="dt">offset =</span> <span class="kw">log</span>(<span class="st">`</span><span class="dt">Person-Years</span><span class="st">`</span>),</a>
<a class="sourceLine" id="cb229-4" data-line-number="4">  <span class="dt">data =</span> smoking_cd_df,</a>
<a class="sourceLine" id="cb229-5" data-line-number="5">  <span class="dt">family =</span> <span class="kw">poisson</span>()</a>
<a class="sourceLine" id="cb229-6" data-line-number="6">)</a>
<a class="sourceLine" id="cb229-7" data-line-number="7"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = `Coronary Deaths` ~ Age + Smoking, family = poisson(), 
##     data = smoking_cd_df, offset = log(`Person-Years`))
## 
## Deviance Residuals: 
##       1        2        3        4        5        6  
## -2.1800   0.9018  -1.3080   0.5104  -0.1379   0.0513  
##       7        8        9       10  
##  0.2289  -0.0873   1.9191  -0.9124  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)      -7.919      0.192  -41.30  &lt; 2e-16 ***
## Age45-54          1.484      0.195    7.61  2.8e-14 ***
## Age55-64          2.628      0.184   14.30  &lt; 2e-16 ***
## Age65-74          3.351      0.185   18.13  &lt; 2e-16 ***
## Age75-84          3.700      0.192   19.25  &lt; 2e-16 ***
## SmokingSmokers    0.355      0.107    3.30  0.00096 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 935.091  on 9  degrees of freedom
## Residual deviance:  12.134  on 4  degrees of freedom
## AIC: 79.2
## 
## Number of Fisher Scoring iterations: 4</code></pre>
<p>接着给5个年龄组分别赋予得分1, 2, 3, 4, 5并拟合(c)中的模型</p>
<div class="sourceCode" id="cb231"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb231-1" data-line-number="1">smoking_cd_df<span class="op">$</span>age_score &lt;-<span class="st"> </span><span class="kw">as.numeric</span>(smoking_cd_df<span class="op">$</span>Age)</a>
<a class="sourceLine" id="cb231-2" data-line-number="2">m2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb231-3" data-line-number="3">  <span class="st">`</span><span class="dt">Coronary Deaths</span><span class="st">`</span> <span class="op">~</span><span class="st"> </span>age_score <span class="op">*</span><span class="st"> </span>Smoking,</a>
<a class="sourceLine" id="cb231-4" data-line-number="4">  <span class="dt">offset =</span> <span class="kw">log</span>(<span class="st">`</span><span class="dt">Person-Years</span><span class="st">`</span>),</a>
<a class="sourceLine" id="cb231-5" data-line-number="5">  <span class="dt">data =</span> smoking_cd_df,</a>
<a class="sourceLine" id="cb231-6" data-line-number="6">  <span class="dt">family =</span> <span class="kw">poisson</span>()</a>
<a class="sourceLine" id="cb231-7" data-line-number="7">)</a>
<a class="sourceLine" id="cb231-8" data-line-number="8"><span class="kw">summary</span>(m2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = `Coronary Deaths` ~ age_score * Smoking, family = poisson(), 
##     data = smoking_cd_df, offset = log(`Person-Years`))
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -3.878  -2.122  -0.248   1.718   3.527  
## 
## Coefficients:
##                          Estimate Std. Error z value
## (Intercept)               -8.8672     0.3057  -29.01
## age_score                  1.0468     0.0774   13.52
## SmokingSmokers             1.2837     0.3258    3.94
## age_score:SmokingSmokers  -0.2490     0.0836   -2.98
##                          Pr(&gt;|z|)    
## (Intercept)               &lt; 2e-16 ***
## age_score                 &lt; 2e-16 ***
## SmokingSmokers            8.2e-05 ***
## age_score:SmokingSmokers   0.0029 ** 
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 935.091  on 9  degrees of freedom
## Residual deviance:  59.895  on 6  degrees of freedom
## AIC: 123
## 
## Number of Fisher Scoring iterations: 4</code></pre>
<p>前一个模型有更小的residual deviance和AIC，看起来似乎前一个模型更好。</p>
<p>出现这种情况的原因可能是吸烟的影响不是随着年龄而线性变化的。</p>

</div>
</div>
</div>
            </section>

          </div>
        </div>
      </div>
<a href="contingency-table.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="logistic-regression.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
    </div>
  </div>
<script src="libs/gitbook/js/app.min.js"></script>
<script src="libs/gitbook/js/lunr.js"></script>
<script src="libs/gitbook/js/plugin-search.js"></script>
<script src="libs/gitbook/js/plugin-sharing.js"></script>
<script src="libs/gitbook/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook/js/plugin-bookdown.js"></script>
<script src="libs/gitbook/js/jquery.highlight.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": true,
"facebook": false,
"twitter": false,
"google": false,
"linkedin": false,
"weibo": false,
"instapper": false,
"vk": false,
"all": ["facebook", "google", "twitter", "linkedin", "weibo", "instapaper"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": "https://github.com/jinzhen-lin/cdacode-document/edit/master/03-glm.Rmd",
"text": "编辑"
},
"download": ["cdacode.pdf", "cdacode.epub", "cdacode.zip"],
"toc": {
"collapse": "section"
}
});
});
</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    var src = "true";
    if (src === "" || src === "true") src = "https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML";
    if (location.protocol !== "file:" && /^https?:/.test(src))
      src = src.replace(/^https?:/, '');
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
</body>

</html>
